Summary

The ability of feed-forward neural network architectures to 
learn continuous-valued mappings in the presence of noise was 
demonstrated in relation to parameter identification and real- 
time adaptive control applications. An error function was 
introduced to help optimize parameter values such as number 
of training iterations, observation time, sampling rate, and 
scaling of the control signal. The learning performance 
depended essentially on the degree of embodiment of the 
control law in the training data set and on the degree of 
uniformity of the probability distribution function of the data 
that are presented to the net during a training sequence. When 
a control law was corrupted by noise, the fluctuations of the 
training data biased the probability distribution function of the 
training data sequence. Only if the noise contamination is 
minimized and the degree of embodiment of the control law 
is maximized, can the neural net develop a good internal 
representation of the mapping and be used as a neurocontroller. 
A multilayer net was trained with back-error-propagation to 
control a cart-pole system for linear and nonlinear control laws 
in the presence of data processing noise and measurement 
noise. The neurocontroller exhibited noise-filtering properties 
and was found to operate more smoothly than the teacher in 
the presence of measurement noise. 


Introduction 

A major challenge for intelligent control (ref. 1) of complex 
Advanced Propulsion Systems (APS), such as the Space Shuttle 
Main Engine, is the real-time analysis of a massive amount 
of diverse sensor data. Such analysis can be used to directly 
perform low-level, real-time adaptive control, to diagnose 
faults, or to send real-time descriptions of the dynamic state 
of the APS to a high-level controller. In the first case, the low- 
level controller has to compute the control signal adaptively 
and in real time so that it can be applied to the controlled 
process for a given set of sensor data. In the second case, the 
sensor information must be translated in real time into one 
of several parameters which characterize specific failure modes 
of the APS. In the third case, the dynamic state needs to be 
identified in real time in order to allow an accurate health 
condition monitoring of the APS. 

In most instances, however, it is difficult, if not impossible,
to derive realistic models of the physical phenomena and
feedback mechanisms which govern the evolution of systems
as intricately complex as APS, and the only information
available often consists of experimental data collected in flight
or during ground tests. Moreover, the presence of noise in
such systems makes it even more difficult to extract the
information contained in the experimental data and to perform
accurate fault diagnosis and condition monitoring.

It is a major asset of neural networks to be able to extract 
features from finite sets of input and output data which 
are representative of arbitrary, unknown continuous-valued 
mappings (refs. 2 to 4). As arrays of simple computing 
elements, neural networks are easy to implement, and benefit 
from attractive real-time processing capabilities due to their 
massive parallelism (refs. 5 and 6). They store the extracted 
features in the distributed network of their interconnections, 
which gives them the fault tolerance desired in hostile or 
remote environments. Such cost-performance advantages make
neural networks well suited for the data processing of fault
diagnosis and condition monitoring of the APS.

This report analyzes the ability of neural networks to learn 
continuous mappings and serve as parameter identifiers or real- 
time adaptive controllers when the data used for training have 
been corrupted by noise during sensor measurements and/or 
off-site data transmission. In the case of adaptive control, the 
noise incurred through sensor measurements corrupts the 
actual values of the state variables (as well as the control signal 
given by the actuator), and it alters the dynamic evolution that 
the controlled process would have had otherwise. This will 
be called plant or measurement noise. On the other hand, noise 
incurred during the (analog) processing of the sensor data only 
corrupts the description of the dynamic evolution of the 
controlled process. This will be called data processing noise 
and will be analyzed first for simplicity. 

In the section Training Architecture With Data Processing 
Noise, a training architecture is proposed to analyze the 
possibility of learning from a teacher-controller in the presence 
of data processing noise. In the section Controlled Cart-Pole 
System, this training architecture is computer simulated on 
the cart-pole system for linear and nonlinear control laws. In 
both cases, the training sequences are analyzed in detail with 
noise-free and noise-corrupted training data. In the section 
Example of Neuromorphic Learning of Nonlinear Control 
With Measurement Noise and Data Processing Noise, the 
results are applied to the most general situation where the 
data representing the dynamics of the controlled process are 
corrupted with both types of noise. 




Training Architecture With Data 
Processing Noise 

In order to identify the factors and parameters which 
influence the neuromorphic learning of continuous-valued 
mappings from noise-corrupted data, it is simpler to consider 
first the effect of data processing noise. The continuous-valued 
mapping is chosen to be a control law: that is, a mapping from 
the space of state variables onto the space of control signals. 
As mentioned in the Introduction, such a mapping can 
be viewed from the perspective of identifying a collective 
parameter associated with several sensor data, or from the 
perspective of real-time adaptive control. Whereas the first 
case would apply to situations of fault diagnosis and component 
degradation, the second case would apply to situations where 
a neurocontroller would be used in place of a human teacher, 
a rule-based automated expert, or a natural servomechanism. 

In the presence of data processing noise, the state $Z$ of the
controlled process and the applied control signal $C$ which are
transmitted to the neural net are

$$Z = Z_{ex} + n_Z \quad \text{and} \quad C = C_{ex} + n_C \tag{1}$$

where the noises $n_Z$ and $n_C$ are simulated as independent,
normally distributed, zero mean processes. For a high
signal-to-noise ratio of the input signal, that is,

$$\frac{\max |Z_{ex}|}{\sqrt{\mathrm{var}(n_Z)}} \gg 1$$

the function $\phi$ that represents the control law
$\phi(Z_{ex}) = C_{ex}$ can be approximated, by using a Taylor
expansion, as

$$\phi(Z) \simeq C_{ex} + n_Z \left.\frac{d\phi}{dZ}\right|_{Z = Z_{ex}} \tag{2}$$

When the noise-to-signal ratio remains small, trying to learn
the mapping $\phi$ from sets of noisy input and output vectors
$(Z, C)$ is equivalent to trying to learn $\phi$ from sets of input
and output vectors $(Z_{ex}, C_n)$ where only the control force is
corrupted by the effective noise:

$$C_n = C_{ex} + \tilde{n} \tag{3}$$

where $\tilde{n}$ is the independent, normally distributed, zero
mean process

$$\tilde{n} \simeq n_C - n_Z \left.\frac{d\phi}{dZ}\right|_{Z = Z_{ex}} \tag{4}$$

When the input and output data transmitted to the net are
corrupted by noise, the factors and parameters which influence
learning can be studied by analyzing the learning performance
of the net as a function of the signal-to-noise ratio
$\max |C_{ex}| / \sqrt{\mathrm{var}(\tilde{n})}$ of the output signal only.

The first phase of the training consisted of sampling, at
various times $t_k$, the state $Z_{ex}(t_k)$ of the controlled process
(fig. 1) and the control signal $C_n(t_k)$ given in equation (3).
In the second phase, the set of training data
$\{Z_{ex}(t_k), C_n(t_k)\}$ was organized in input-output subsets
before being applied to the neuromorphic controller as
described below.

Figure 1. Training architecture for neuromorphic learning with noisy
data processing.


Controlled Cart-Pole System

The controlled process of figure 1 was chosen to be the cart-pole
system (refs. 4, 7, and 8) represented in figure 2. Training
data were recorded by placing the cart pole at arbitrary initial
positions $[X(0), \theta(0)]$ with zero velocities and by driving it
to the origin ($X = 0$, $\theta = 0$) with a control force. While the
cart pole was returned to its equilibrium position, the
four-dimensional state vector
$Z(t) = [X(t), \dot{X}(t), \theta(t), \dot{\theta}(t)]$ and
the control force $F(t)$ were regularly sampled over a certain
period of time. Sampling rate and observation time were
considered to be parameters of the training. In the first part
of this section, a method based on the simple example of a
linear control law is derived to optimize learning from
noise-corrupted data. In the second part, it is applied and
tested on a nonlinear control law.

Figure 2. Cart-pole system. The control force is applied to the cart in the
presence of friction. Mass of cart, M, 1 kg; mass of pole, m, 0.1 kg; distance
between base of pole and center of gravity of pole, L, 1 m; friction force
applied to cart, 5 kg/sec; acceleration due to gravity, g, 9.81 m/sec^2.


Linear Control Law 

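For small deviations of the rod around its equilibrium
position $\theta = 0$, the dynamics of the cart pole can be linearized
and are given by

$$\ddot{X} = \frac{1}{M}\left[F(Z) - f\dot{X}\right], \qquad \ddot{\theta} = \frac{3}{4L}\left(g\theta - \ddot{X}\right) \tag{5}$$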


A linear controller that stabilizes this dynamic system to the
origin ($X = 0$, $\theta = 0$) is (ref. 4)

$$F(Z) = k_X X + k_{\dot{X}} \dot{X} + k_\theta \theta + k_{\dot{\theta}} \dot{\theta} \tag{6}$$

where the coefficients are $k_X = 11.01$, $k_{\dot{X}} = 19.68$,
$k_\theta = 96.49$, and $k_{\dot{\theta}} = 35.57$.
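
As a minimal sketch, the linear control law of equation (6) can be written as a short Python function. The function name and the use of radians for the angle are assumptions made for this illustration; the gains are the ones quoted above. The example evaluates the force at one corner of the initial-condition domain used later in this report (2 m, 20 degrees, zero velocities).

```python
import math

# Gains of the linear control law, equation (6), as quoted in the text.
K_X, K_XDOT, K_THETA, K_THETADOT = 11.01, 19.68, 96.49, 35.57


def linear_control_force(x, x_dot, theta, theta_dot):
    """Control force F(Z) in newtons; theta is assumed to be in radians."""
    return K_X * x + K_XDOT * x_dot + K_THETA * theta + K_THETADOT * theta_dot


# Force at one corner of the initial-condition domain, with zero velocities.
f_corner = linear_control_force(2.0, 0.0, math.radians(20.0), 0.0)
print(f"{f_corner:.2f} N")  # prints 55.70 N
```

Under these assumptions the initial force at the corner of the domain is of the same order as the 60 N estimate of the maximum control force reported later for the sampled motions.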

Throughout this work, the term “neuron” is used to
represent a simple processing element whose input and output
response curve is a sigmoid that can be modeled as

$$\text{output} = \frac{1}{1 + \exp(-\text{input})} \tag{7}$$


Although the output of a neuron can take any value in the
interval [0, 1], learning performance is known to be enhanced
(ref. 9) when the asymptotes of the activation function given
in equation (7) are eliminated by restricting the information
domain of a neuron output to the interval [0.1, 0.9]. Thus,
the neuromorphic learning of the continuous-valued mapping
$F(Z)$ requires the scaling and offsetting of the last layer output
$o_n$:

$$u = \frac{F}{F_0} = 2.5\,o_n - 1.25 \tag{8}$$

where $F_0$ is a constant parameter that normalizes the control
signal over [-1, +1]. It is essential to emphasize that the choice
of $F_0$ defines the domain where the mapping is to be learned,
and it influences the neural computation as well. Equations
(7) and (8) show that the net output cannot match values of
$F$ such that $|F| > 1.25 F_0$ (practically, the net can only match
accurately the domain $|F| < F_0$). When a control law or a
continuous-valued mapping is to be learned, it is imperative
to first determine the range of variations, $F_{max}$, of the control
signal before $F_0$ is chosen. The value of $F_0$ that satisfies
$F_0 \geq |F_{max}|$ and corresponds to the best approximation of the
continuous-valued mapping by the neural net can be obtained
by minimizing an error function.

Figure 3. Perceptron architecture to learn the linear control law.
$\epsilon = \frac{1}{2}(o_n - \hat{o}_n)^2$; $o_n = [1.25 + F(Z)/F_0]/2.5$.
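
The scaling and offsetting of equation (8) can be sketched as a pair of conversion functions. The function names are hypothetical, and the value of 60 N for $F_0$ is taken from the training runs described later in this report.

```python
def force_to_target(force, f0):
    """Map a control force to a neuron output by inverting equation (8)."""
    return (force / f0 + 1.25) / 2.5


def output_to_force(output, f0):
    """Map a neuron output back to a control force using equation (8)."""
    return f0 * (2.5 * output - 1.25)


F0 = 60.0  # normalizing force in newtons (value used later in the text)
for force in (-60.0, 0.0, 60.0, 75.0):
    print(force, round(force_to_target(force, F0), 3))
```

Forces of magnitude $F_0$ map to the ends of the restricted interval [0.1, 0.9], whereas a force of $1.25 F_0$ maps to the asymptote of the sigmoid, which the neuron can approach but never reach.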

The neuromorphic architecture of figure 3 consists of a
single neuron in the output layer and four fan-out units in the
input layer. Each input unit feeds into the net one component
of the state vector $Z = (X, \dot{X}, \theta, \dot{\theta})$. Just as the
chemical voltage of a biological neuron is modulated by its
synaptic connections before it contributes to the excitation or
inhibition of another neuron, each signal of the input layer is
subsequently modulated by a weight $w$ and contributes to the
total input signal of the output neuron:

$$i_{net} = w_X X + w_{\dot{X}} \dot{X} + w_\theta \theta + w_{\dot{\theta}} \dot{\theta} + w_{th} \tag{9}$$

where $w_{th}$ represents the “synaptic” weight connecting the
output neuron to an input neuron that is permanently “on”
(i.e., with an output signal of +1). With the neuron activation
function given in equation (7), $w_{th}$ represents the threshold of
external excitation below which the neuron is inhibited and
above which it is activated.

Whether the architecture of figure 3 can learn the linear
control law given in equation (6) depends on whether a set of
weights $w_X$, $w_{\dot{X}}$, $w_\theta$, $w_{\dot{\theta}}$, and $w_{th}$ can be
found that satisfies the condition

$$\frac{1}{2.5}\left(1.25 + \frac{F}{F_0}\right) = \frac{1}{1 + \exp(-i_{net})} \tag{10}$$

over the domain of variations of the control force $F$. For
$|F|/F_0 \ll 1$, the Taylor expansion of equation (10) around the
origin leads to the condition




$$i_{net} \simeq 1.6\,\frac{F}{F_0} \tag{11}$$

which is equivalent to

$$w_{th} = 0 \quad \text{and} \quad w_\lambda = \frac{1.6}{F_0}\,k_\lambda + O\!\left(\frac{F}{F_0}\right) \quad \text{with } \lambda = X, \dot{X}, \theta, \dot{\theta} \tag{12}$$

The above equations are formally satisfied over a finite domain
of variations of the control force $F$ only if the limit
$F_0 \to \infty$, where $O(F/F_0) \to 0$, is taken. However, as will be
demonstrated in the next subsection, finite values of $F_0$ lead
to a good approximation of the weights that map the pairs
$[F(Z), Z]$ for $|F| < F_0$.
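
The quality of the first-order weights can be checked numerically. The sketch below compares the two sides of equation (10) when the net input is set to $1.6\,F/F_0$, that is, when only the leading term of the Taylor expansion is retained. The function names are hypothetical and the values of $u = F/F_0$ are chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def target_output(u):
    """Left-hand side of equation (10) for the normalized force u = F/F0."""
    return (1.25 + u) / 2.5


def small_signal_error(u):
    """Mismatch in equation (10) when the net input is taken as 1.6*u."""
    return abs(target_output(u) - sigmoid(1.6 * u))


for u in (0.05, 0.1, 0.5, 1.0):
    print(f"u = {u:4.2f}  mismatch = {small_signal_error(u):.5f}")
```

The mismatch vanishes at the origin and grows with $|u|$, which is the reason the weights of equation (12) are exact only in the limit of large $F_0$.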

The results presented hereafter were obtained by imple- 
menting the back-error-propagation (BEP) algorithm on a 
VAX 8800 at the NASA Lewis Research Center. The initial 
values of weights and thresholds were chosen to be randomly 
distributed in the interval [-0.1, +0.1] to break symmetries 
that could eventually lead to spurious modes and bias the 
learning. 

Learning from noise-free data. In the absence of noise,
learning is accomplished by training the network with a
training data set. If the training data set covers only a limited
region $S_1$ of the state space $S_0$ of the control law (i.e.,
$S_1 \subset S_0$), the neural net can, in general, only extract the
input/output relation over $S_1$; it cannot generalize it over $S_0$.
Consequently, the control law would be only partially
embodied in the training data set $S_1$, and the neural net would
only approximate the control law over $S_1$. In addition, the
accuracy of the neural approximation of the control law over
$S_1$ depends on how uniformly the available data can be
ordered by magnitude throughout the dimensions of $S_1$. The
ordering of the training data has to be uniform to prevent the
net from focusing on any part of $S_1$ to the exclusion of the
remainder.

One way to obtain a high degree of embodiment of the
control law in the training data set is to observe many responses
of the cart pole to random displacements from its equilibrium
position. The degree of embodiment of the control law in the
training data set grows (and subsequently converges) with the
number of such motions $N_{motions}$, the length of observation
time $T$, and the sampling rate $f_s$. The values of $N_{motions}$, $T$,
and $f_s$ leading to a sufficiently representative training data set
depend on the application and, in general, have to be
determined numerically. For this analysis, the pool of all cart-pole
responses is called the training data set and consists of
$N_{motions} T f_s$ data points. Since the state of the cart pole was
sampled at regular, fixed intervals until it returned to the
origin, the distribution of the forces $F$ used for training the
net (as shown by a histogram of the number of occurrences
versus magnitude, e.g.) was peaked around the origin. The
distribution became more peaked as the observation time $T$
increased. Clearly a random ordering of the training data
would not be uniform and would bias the training to the origin,
thus preventing the net from learning the control law for large
displacements of the cart pole. Before training began, data
were organized by dividing the interval [-1, +1] of
the normalized force $u = F/F_0$ into $N_{F_0}$ subintervals
$I_k$ ($k = 1, N_{F_0}$) of equal size. An approximately uniform
ordering of the training data set could then be obtained from
the random sampling of an interval $I_k$ followed by the random
sampling of the normalized force $u \in I_k$. Once selected, $u$ and
its corresponding state $Z$ were then presented to the net as
training data. Each $(u, Z)$ pair presented to the net represented
the information required for one update, or training iteration,
of the network weights. The sequence of all pairs presented
to the net is called a training data sequence. The update
procedure is given in the next paragraph.
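
The two-stage sampling described above (a random subinterval, then a random datum within it) can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the function name, the handling of empty subintervals, and the synthetic data (a clipped normal distribution standing in for the peaked distribution of recorded forces) are all hypothetical.

```python
import numpy as np


def build_training_sequence(u, states, n_bins, n_samples, rng):
    """Draw (u, Z) pairs so that every occupied subinterval is equally likely.

    u      : normalized forces F/F0, shape (N,)
    states : matching state vectors Z, shape (N, 4)
    n_bins : number of equal-size subintervals I_k of [-1, +1]
    """
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    # Subinterval index of each datum; out-of-range values go to the end bins.
    bin_of = np.clip(np.digitize(u, edges) - 1, 0, n_bins - 1)
    members = [np.flatnonzero(bin_of == k) for k in range(n_bins)]
    occupied = [k for k in range(n_bins) if members[k].size > 0]

    chosen_bins = rng.choice(occupied, size=n_samples)          # random I_k
    picks = np.array([rng.choice(members[k]) for k in chosen_bins])  # random u in I_k
    return u[picks], states[picks]


# Synthetic stand-in for the recorded data: forces peaked around the origin.
rng = np.random.default_rng(0)
u_all = np.clip(rng.normal(0.0, 0.35, size=20000), -1.0, 1.0)
z_all = rng.normal(size=(20000, 4))  # placeholder states, not cart-pole data
u_seq, z_seq = build_training_sequence(u_all, z_all, n_bins=7, n_samples=7000, rng=rng)
counts, _ = np.histogram(u_seq, bins=np.linspace(-1.0, 1.0, 8))
print(counts)  # roughly equal counts in every subinterval
```

Although the pool of data is strongly peaked around the origin, the resulting sequence spans the subintervals approximately uniformly, which is the property required of a training data sequence.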

For a randomly selected input state vector, the resulting
network output $\hat{o}_n$ was compared with the target output
$o_n = (u + 1.25)/2.5$. At each training iteration, the error
$\epsilon = \frac{1}{2}(o_n - \hat{o}_n)^2$ between the target output $o_n$ and
the network output $\hat{o}_n$ was back-propagated through all the
net layers to update weights and thresholds by a steepest
descent minimization of $\epsilon$. The BEP update of the network was
iterated until convergence. For a single-layer, feed-forward net,
the changes $\delta w^{(n)}$ of the weights at the $n$th iteration were

$$\delta w^{(n)} = -\alpha\,\frac{\partial \epsilon}{\partial w} + \beta\,\delta w^{(n-1)} \tag{13}$$

In this equation, the first term is directly proportional to the
gradient of the error, and the second term (or momentum term)
modulates the steepest descent update. For the perceptron of
figure 3, weights and thresholds were updated by

$$\delta w_Z^{(n)} = \alpha\,(o_n - \hat{o}_n)\,\hat{o}_n (1 - \hat{o}_n)\,Z + \beta\,\delta w_Z^{(n-1)}$$

and

$$\delta w_{th}^{(n)} = \alpha\,(o_n - \hat{o}_n)\,\hat{o}_n (1 - \hat{o}_n) + \beta\,\delta w_{th}^{(n-1)} \tag{14}$$
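
The update rule of equation (14) can be sketched for a single sigmoid neuron as follows. The function name, the random choice of one training pair per iteration, and the synthetic teacher used in the usage lines are assumptions made for this illustration; the default values of the steepest descent parameter and momentum coefficient, and the initialization interval [-0.1, +0.1], are those quoted in this report.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_perceptron(states, targets, alpha=0.2, beta=0.9, n_iter=10000, seed=0):
    """Single sigmoid neuron trained one sample at a time with momentum.

    states  : array (N, 4) of state vectors Z
    targets : array (N,) of target outputs o_n
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.1, 0.1, size=states.shape[1])  # input weights
    w_th = rng.uniform(-0.1, 0.1)                      # threshold weight
    dw, dw_th = np.zeros_like(w), 0.0
    for _ in range(n_iter):
        i = rng.integers(len(states))
        z, o = states[i], targets[i]
        o_hat = sigmoid(w @ z + w_th)                  # equations (9) and (7)
        delta = (o - o_hat) * o_hat * (1.0 - o_hat)
        dw = alpha * delta * z + beta * dw             # equation (14), weights
        dw_th = alpha * delta + beta * dw_th           # equation (14), threshold
        w, w_th = w + dw, w_th + dw_th
    return w, w_th


# Hypothetical check: recover the weights of a synthetic sigmoid teacher.
rng = np.random.default_rng(1)
Z = rng.uniform(-1.0, 1.0, size=(2000, 4))
w_true = np.array([0.3, 0.5, -0.4, 0.2])
targets = sigmoid(Z @ w_true)
w_fit, w_th_fit = train_perceptron(Z, targets)
print(np.round(w_fit, 3), round(float(w_th_fit), 3))
```

In this sketch the training pairs are drawn uniformly at random; in the procedure described above they would instead be drawn from the ordered training data sequence.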

When mapping features are to be extracted, it is well known
that any a priori knowledge, such as symmetries, can
efficiently improve the net performance when it is incorporated
explicitly in the neural computation. Since the control force
changes as $F \to -F$ when the state vector changes as $Z \to -Z$,
the ensemble of training data was chosen to be symmetric under
the transformation $T = [(Z, F) \to (-Z, -F)]$ by randomly
distributing the initial position and angle of the cart pole over
the domain $D_{X\theta} = [-2\ \text{m}, +2\ \text{m}] \times [-20°, +20°]$.
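
A minimal sketch of drawing initial conditions over this domain is given below. The function names are hypothetical. The `mirror` helper enforces the symmetry exactly by appending the image of every datum; this is a variation introduced here for illustration, whereas the text above obtains the symmetry statistically through the uniform random distribution of initial conditions.

```python
import numpy as np


def sample_initial_states(n_motions, rng, x_max=2.0, theta_max_deg=20.0):
    """Initial states (X, X_dot, theta, theta_dot), uniform over the domain,
    with zero initial velocities. Angles are returned in radians."""
    x0 = rng.uniform(-x_max, x_max, n_motions)
    theta0 = np.radians(rng.uniform(-theta_max_deg, theta_max_deg, n_motions))
    zeros = np.zeros(n_motions)
    return np.column_stack([x0, zeros, theta0, zeros])


def mirror(states, forces):
    """Append the images of all (Z, F) pairs under (Z, F) -> (-Z, -F)."""
    return np.vstack([states, -states]), np.concatenate([forces, -forces])


rng = np.random.default_rng(0)
z0 = sample_initial_states(200, rng)
gains = np.array([11.01, 19.68, 96.49, 35.57])  # equation (6)
z_sym, f_sym = mirror(z0, z0 @ gains)
print(z_sym.shape, f_sym.shape)
```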

Since all subintervals ($I_k$) were treated with equal
probability during training, there had to be enough training
data to provide as uniform a representation of the control law
over each $I_k$ as possible. In addition, the degree of
embodiment of the control law in the training data set depended on
the sampling rate $f_s$, which determined the degree of relatedness
between the successively sampled data points $[F(Z), Z]$. To
optimize $f_s$ from an information-theoretic point of view, one
can analyze the control law in the frequency domain. If $f_c$
represents a cutoff frequency above which the spectral
components of the control signal and the state vector are small
(and can be neglected), $f_s$ can be chosen to be equal to the
Nyquist frequency, $f_{Nyquist} = 2 f_c$. If $f_s \ll f_{Nyquist}$,
information relative to the features between the state vector and
the control signal will be missing in the training data set. If
$f_s \gg f_{Nyquist}$, the training data set will contain redundant
information resulting in unnecessarily large memory space. The
state space where the control law is to be learned will be
bounded by the maximum value $F_{max}$ of the control force needed
to bring the cart pole back to the origin from an arbitrary
position in $D_{X\theta}$ and with zero initial velocities. The value of
$F_{max}$ could be estimated from the training data set itself. For
the parameter values of figure 1 and equation (6), the training
data set was constructed by sampling 200 motions at the
frequency rate $f_s$ of 20 Hz, over a period $T$ of 10 sec, leading
to the estimate $F_{max} \simeq 60$ N.

Evaluation of learning performance without noise. To
evaluate the learning performance, we divided the interval
[-1, +1] of the normalized force $F/F_{max}$, which represents
the state space where the control law is to be learned, into
$N_F$ subintervals $I_k$ ($k = 1, N_F$) of equal size. After the
net was trained with a given value of the parameter $F_0$, the
accuracy of the neural approximation of the control law could
be characterized by the total mean-squared error $e^2$

$$e^2 = \frac{1}{N_F} \sum_{k=1}^{N_F} e_k^2 \tag{15}$$

where $e_k^2$ is the mean-squared error over the subinterval $I_k$:

$$e_k^2 = \frac{1}{n(k)} \sum_{i=1}^{n(k)} \left[ u(i)_{net} - u(i)_{target} \right]^2 \tag{16}$$

In equation (16), $u(i)_{target} = F(i)_{target}/F_0$ is one of the
noise-free data values used for training and
$u(i)_{net} = F(i)_{net}/F_0$ is the output of the net corresponding
to the same state variable. The error $e_k^2$ is averaged over the
$n(k)$ normalized forces $F_{target}/F_{max}$ contained in $I_k$. This
definition of the error makes it possible to compare the learning
performances corresponding to different values of $F_0$ and to
choose the optimal value of $F_0 \geq F_{max}$ for which $e^2$ is
minimal. The convergence of the algorithm can be estimated from
the change in the error $e^2$ (eq. (15)) as the number of iterations
increases. Here $e^2$ is obtained while an attempt is being made
to reproduce the data used for training. Similarly, the accuracy
of feature extraction can be estimated from the magnitude of
the error $e^2$ obtained while trying to predict the control forces
corresponding to state vectors that were not presented to the
net during training. When both errors are small, the neural net
has developed a good internal representation of the mapping.
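
The error function of equations (15) and (16) can be sketched as follows. The function name and argument layout are hypothetical. One assumption is labeled in the code: subintervals that contain no data are skipped, so the average is taken over the occupied subintervals only, whereas equation (15) divides by the full number of subintervals.

```python
import numpy as np


def binned_mse(u_target, u_net, f0, f_max, n_bins):
    """Total error e^2 of equation (15): the mean over subintervals of the
    per-subinterval mean-squared error of equation (16). The subintervals
    partition the normalized force F/F_max over [-1, +1]."""
    u_target = np.asarray(u_target, dtype=float)
    u_net = np.asarray(u_net, dtype=float)
    position = u_target * f0 / f_max  # F_target / F_max
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    k = np.clip(np.digitize(position, edges) - 1, 0, n_bins - 1)
    per_bin = []
    for b in range(n_bins):
        mask = k == b
        if mask.any():  # assumption: empty subintervals are skipped
            per_bin.append(np.mean((u_net[mask] - u_target[mask]) ** 2))
    return float(np.mean(per_bin))


# Hypothetical check: a constant output offset of 0.1 gives e^2 = 0.01.
u_t = np.linspace(-1.0, 1.0, 101)
e2_offset = binned_mse(u_t, u_t + 0.1, f0=60.0, f_max=60.0, n_bins=7)
e2_exact = binned_mse(u_t, u_t, f0=60.0, f_max=60.0, n_bins=7)
print(e2_offset, e2_exact)
```

Because every subinterval carries the same weight in the average, a net that is accurate only near the origin, where most of the data lie, is penalized for its errors at large forces.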

Because of the statistical nature of the training, the internal
representation of the mapping may vary from one training
sequence to another. The reliability of the net to learn the same
linear law from two or more training sets is therefore an
important criterion of the neural computation and is called
learning reliability. Learning reliability was tested by
estimating the fluctuations of the total mean-squared error $e^2$
(eq. (15)) over a set of 10 training sequences. For each training
sequence, a “learning error” was estimated by calculating
equation (15) over the full training data set. Similarly, a
“generalizing error” was estimated by calculating
equation (15) over a new training data set obtained by randomly
generating 200 new motions of the cart pole sampled at the
same rate and over the same period of time.

In figure 4, parts (a) and (b) show the mean values and
standard deviations of learning errors and generalizing errors,
respectively, as functions of the number of update iterations.
After 5000 iterations, the average values of the learning error
and generalizing error are both small, with small standard
deviation, indicating that iterative convergence has been
reached. Figure 5 shows the generalizing error of the
perceptron after 10 000 iterations for different values of the
parameter $F_0$ (30, 60 (= $F_{max}$), 120, and 180 N). To maintain
a similar resolution of the ordering of the training data, we
organized the interval [-1, +1] of the normalized forces
$u = F/F_0$ used for training into subintervals of equal size
($N_{F_0}$ = 7, 11, 21, 31) corresponding to $F_0$ = 30,
60 (= $F_{max}$), 120, and 180 N, respectively. For $F_0$ = 30 N, the
error was significantly higher than for $F_0 \geq F_{max}$ = 60 N,
since the net can only map the control law over a limited region
of the state space; it is not able to generalize it on the
remainder. As expected from equation (12), the error decreased
as $F_0$ departed from $F_{max}$ since the approximation of a linear
control law by the perceptron improves as $F_0 \to \infty$. The rest
of this subsection describes training for $F_0 = F_{max}$ = 60 N.

(a) Error in learning mode calculated over the training data set.

(b) Error in generalizing mode calculated over the newly generated data
that were not used to train the perceptron.

Figure 4. Estimation of mean-squared error $e^2$ and its standard deviation,
in training the perceptron to learn the linear control law from noise-free
data. Ten training sequences; control signal normalizing factor, $F_0$, 60 N;
steepest descent parameter, $\alpha$, 0.2; momentum coefficient, $\beta$, 0.9.
(Horizontal axis: number of iterations.)

Figure 5. Estimation of mean-squared error $e^2$ (generalizing mode) and its
standard deviation versus control signal normalizing factor, $F_0$, after
training the perceptron to learn the linear control law from noise-free data
for several values of $F_0$. (Error calculated over newly generated data that
were not used to train the perceptron.) Ten training sequences, 10 000
iterations per sequence; steepest descent parameter, $\alpha$, 0.2; momentum
coefficient, $\beta$, 0.9. (Horizontal axis: normalization force, $F_0$, N.)

Consequently, the learning performance of the net was tested
in three different configurations. First, the learning open-loop
configuration tested the ability of retrieving $F$ from a state
$Z$ that was used for training (learning error). In this case, the
net was essentially used as an analog memory. Second, the
generalizing open-loop configuration tested the ability of the
net to generate a control force from a state vector that was
not used for training (generalizing error). In this case, the net
was used as a parameter estimator. Third, the generalizing
closed-loop configuration tested the ability of the net to
stabilize the cart-pole process for a motion that was not used
for training. In this case, the net was used as a real-time
adaptive controller.

Convergence of the training sequence occurred after 10 000
training iterations, with $\alpha$ = 0.2 and $\beta$ = 0.9. In the learning
open-loop mode, estimates of the total mean-squared error $e^2$
(eq. (15)) and its standard deviation over a random set of 10
training sequences were 0.00017 and 0.00008, respectively.
In the generalizing open-loop mode, estimates of $e^2$ and its
standard deviation were 0.00022 and 0.00008, respectively.
In the generalizing closed-loop mode, estimates of $e^2$ and its
standard deviation were 0.00019 and 0.00008, respectively.
These results indicate the excellent performance of the
neural net.

The dynamic characteristics of the cart pole controlled
by the trained neural net were simulated for the initial state
vectors $Z(0)$ = (-0.7 m, 0, -17°, 0) and $Z(0)$ = (1.8 m, 0,
-17°, 0). As expected, the results obtained from the
neuromorphic controller and the teacher were in perfect agreement.

Learning from noise-corrupted data. Just as in the noise-free
case, learning in the presence of noise was accomplished
by constructing a representative training data sequence.
However, the presence of noise limited the degree of representation
that could be transferred from the original data into a training
data sequence. This can be seen by studying the ordering
process used to construct a uniform span of the state space
as explained in the following paragraphs.

In the presence of noise, the normalized values of the control
forces used as targets are no longer the exact values since

$$u_n^{target} = u_{ex} + \tilde{n} \tag{17}$$


However, the mapping $F(Z)$ can still be learned by BEP if the
training data set consisting of $u_n^{target}(i)$ is representative of
the state space of the control law. Given the statistically
averaged, error-squared function
$\langle |F(Z) + \tilde{n}_F - G(Z)|^2 \rangle$ over the entire
state space, the BEP yields a function $G(Z)$ that minimizes
the error as shown variationally in equation (18):

$$\frac{\delta \langle |F(Z) + \tilde{n}_F - G(Z)|^2 \rangle}{\delta G(Z)} = 0 \;\Rightarrow\; G(Z) = F(Z) \tag{18}$$

In the absence of noise, it is important to construct a uniformly
distributed training data sequence to increase the degree
of representation of the underlying function to be learned. This
enables the net to reproduce the features of the function, rather
than “memorize” the data, which is even more crucial in the
presence of noise since the target data are not the exact values
of the force. It is then imperative not to learn any
particular target data, but instead to minimize the error
uniformly over the entire training data set.
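
The content of equation (18) can be illustrated numerically. In the sketch below, a closed-form linear least-squares fit is used in place of BEP (a deliberate substitution, made only to keep the example short), and the state ranges are hypothetical. The targets are the forces of equation (6) corrupted by zero mean noise with a noise-to-signal ratio of 0.1 relative to 60 N.

```python
import numpy as np

rng = np.random.default_rng(0)
k_true = np.array([11.01, 19.68, 96.49, 35.57])  # gains of equation (6)

# Hypothetical states: uniform ranges chosen only for this illustration.
n = 40000
low = np.array([-2.0, -2.0, -0.35, -1.0])
high = np.array([2.0, 2.0, 0.35, 1.0])
Z = rng.uniform(low, high, size=(n, 4))

F_exact = Z @ k_true
F_noisy = F_exact + rng.normal(0.0, 0.1 * 60.0, size=n)  # zero mean noise

# Minimizer of the empirical mean of |F + n - G(Z)|^2 over linear maps G.
k_fit, *_ = np.linalg.lstsq(Z, F_noisy, rcond=None)
print(np.round(k_fit, 2))
```

Because the noise has zero mean and is independent of the state, the fitted gains approach those of the noise-free control law as the number of data grows, in agreement with equation (18).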

Noise fluctuations tend to bias the learning through “data
contamination” between the subintervals $I_k$. Because of
noise, target data are likely to lie in subintervals that do not
contain their exact data counterpart. This tendency implies that,
unless the exact data are uniformly distributed throughout the
state space (which in practice occurs rarely), the sampling by
subintervals as described in the previous section will result
in a less uniform distribution as the noise increases.
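
The data contamination described above can be quantified with a short simulation. The sketch below measures the fraction of data whose noisy target falls in a different subinterval from its exact counterpart. The function name, the number of subintervals, and the Laplace distribution standing in for the peaked distribution of exact forces are assumptions made for this illustration.

```python
import numpy as np


def contamination_fraction(u_exact, noise_std, n_bins, rng):
    """Fraction of data whose noisy target lies in a different subinterval
    of [-1, +1] than the exact value."""
    edges = np.linspace(-1.0, 1.0, n_bins + 1)

    def bin_index(u):
        return np.clip(np.digitize(u, edges) - 1, 0, n_bins - 1)

    u_noisy = u_exact + rng.normal(0.0, noise_std, size=u_exact.shape)
    return float(np.mean(bin_index(u_noisy) != bin_index(u_exact)))


rng = np.random.default_rng(0)
# Hypothetical exact normalized forces, peaked around the origin.
u_exact = np.clip(rng.laplace(0.0, 0.15, size=50000), -1.0, 1.0)
fractions = {ns: contamination_fraction(u_exact, ns, 11, rng) for ns in (0.0, 0.1, 0.5)}
print(fractions)
```

Here the noise standard deviation is expressed in units of the normalized force, so that it plays the role of the noise-to-signal ratio. The fraction of displaced data is zero without noise and grows as the ratio increases.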


This phenomenon is illustrated in figures 6 to 8, which show
the probability distribution function of the data actually
presented to the net during training for various noise-to-signal
ratios, $N/S = \sqrt{\mathrm{var}(\tilde{n})}/F_{max}$, and for different
observation times $T$ of the cart-pole motions. As expected in the
absence of noise, and because of the construction approach used,
the data presented to the net during training (fig. 6) are, to a
very good approximation, uniformly distributed throughout
the state space. This uniformity does not depend on the length
of observation (e.g., $T$ = 0.25, 2.5, or 25 sec).

As shown in figure 7 in the presence of noise, the probability
distribution function becomes less uniform as $T$ increases. As
noted earlier, the density distribution function of the forces
$F(Z)$ sampled from the teacher was more peaked around the
origin as the observation time increased. In addition, the
number of exact data that left their subinterval
$I_k \subset [-1, +1]$ because of the noise fluctuations was
proportional to the number of data contained in $I_k$. As a result,
the most highly populated subintervals increasingly populated
their neighboring subintervals as $T$ increased. Since the training
data sequence was generated from the random sampling of a
subinterval $I_k$ followed by the random sampling of a $u \in I_k$,
the probability distribution function of a training sequence was
more peaked as $T$ increased. In figure 7, for N/S = 0.1, it is
interesting to note that there is a relative depletion of the
subintervals $I_4$, $I_5$, $I_7$, and $I_8$ surrounding $I_6$. Here, the
small N/S ratio limited the spreading of the data of the
overpopulated subinterval $I_6$ to its neighboring subintervals.
Furthermore, as N/S increased, the repopulation was no longer
limited to a range of one or two subintervals, but it spread
throughout the whole state space. In figure 8, for N/S = 0.5 and
$T$ = 25 sec, the probability distribution function of the data
presented to the net during a training sequence reduced to that
of the overpopulated subinterval $I_6$, which would clearly
prevent effective learning.

(a) Observation time, T, 0.25 sec.

(b) Observation time, T, 2.5 sec.

(c) Observation time, T, 25 sec.

Figure 6. Probability distribution function of noise-free data presented to the network during training with the linear control law. (Index k defines the subinterval
$I_k$ of the state space [-1, +1] of the normalized force. Horizontal axis: subinterval index.)

(a) Observation time, T, 0.25 sec.

(b) Observation time, T, 2.5 sec.

(c) Observation time, T, 25 sec.

Figure 7. Probability distribution function of data that are presented to the network during training with the linear control law; noise-to-signal ratio, N/S,
0.1. The index k defines the subinterval $I_k$ of the state space [-1, +1] of the normalized force $F/F_{max}$ of the linear control law. (Horizontal axis: subinterval index.)

(a) Observation time, T, 0.25 sec.

(b) Observation time, T, 2.5 sec.

(c) Observation time, T, 25 sec.

Figure 8. Probability distribution function of data that are presented to the network during training with the linear control law; noise-to-signal ratio, N/S,
0.5. The index k defines the subinterval $I_k$ of the state space [-1, +1] of the normalized force $F/F_{max}$ of the linear control law. (Horizontal axis: subinterval index.)

As T becomes smaller, the probability distribution of a
training data sequence becomes more uniform. However, as
T becomes smaller, the control law is less embodied in the
training data set since less of the state space is included in the
data available for training. On the other hand, long observation
times cause the noise repopulations to bias the training
sequence by overemphasizing the contribution of the small
amplitude motions of the cart pole: that is, F ≈ 0. Consequently,
an optimal value of T should be determined to minimize
the noise contamination of the training data set and to
maximize the degree of representation of the control law by
the training data sequence.
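
The trade-off described above can be sketched numerically. The example below is only an illustration under stated assumptions: it replaces the cart-pole control signal with a hypothetical exponentially decaying normalized force, adds Gaussian noise whose standard deviation equals the noise-to-signal ratio, and counts samples in 11 equal subintervals of [-1, +1] (so that the central one plays the role of I_6). The decay constant `tau`, the number of motions, and the function name are not taken from the source.

```python
import numpy as np

def subinterval_histogram(T, n_motions=2000, fs=20.0, noise_to_signal=0.1,
                          n_sub=11, tau=1.0, seed=0):
    """Fraction of noisy samples falling in each subinterval of [-1, +1]."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, T, 1.0 / fs)                    # sampling instants of one motion
    u0 = rng.uniform(-1.0, 1.0, size=(n_motions, 1))   # random initial normalized force
    u = u0 * np.exp(-t / tau)                          # toy decaying response (assumption)
    u_noisy = u + rng.normal(0.0, noise_to_signal, size=u.shape)
    u_noisy = np.clip(u_noisy, -1.0, 1.0)              # keep samples inside the state space
    counts, _ = np.histogram(u_noisy, bins=n_sub, range=(-1.0, 1.0))
    return counts / counts.sum()

for T in (0.25, 2.5, 25.0):
    p = subinterval_histogram(T)
    print(f"T = {T:5.2f} s, fraction in central subinterval = {p[n_c]:.2f}"
          if (n_c := len(p) // 2) is not None else "")
```

In this toy model the central subinterval absorbs a growing share of the samples as T increases, which mirrors the overpopulation of I_6 discussed in the text; short observation times give a flatter histogram but sample less of each response.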

Evaluation of learning performance in the presence of
noise.— Prior to evaluating the learning performance with
noisy training data, the boundaries of the (exact) state space
where the control law is to be learned have to be estimated.
In contrast to the noise-free case, where F_max can be obtained
by direct observation of the training data set, an estimate of
F_max is best obtained by training the net for several increasing
values of F_0 and by subsequently analyzing the boundary
changes of the state space spanned by the net output after it
has been presented with the data used for training. An estimate
of F_max is reached when the boundary of the state space of
the net output does not change as F_0 increases. By using the
same ordering technique as in the noise-free case, one can
divide the estimated state space of the normalized control law
F/F_max into N_F subintervals of equal size. To optimize the
parameters of the computation in the presence of noise, we
introduced an error function to estimate the degree of accuracy
to which the neural mapping approximated the exact unknown
mapping


$$e^{2}_{\mathrm{noise}} = \ldots \qquad (19)$$



In equation (20), u(i)_noise = F(i)_noise/F_0 is one of the
noisy data values used for training, and u(i)_net = F(i)_net/F_0
is the output of the net corresponding to the same state variable.
After training, however, the control force F(i)_net output by
the net is expected to be closer to the exact value than the
(noisy) target value used for training. As a result, in contrast
to the noise-free case, a better performance evaluation is
expected by averaging the error over the m(k) normalized
forces F(i)_net/F_max contained in I_k.

In the statistical limit where m(k) → ∞ and for perfect
learning, e²_noise would be minimal and equal to var(n)/F_max².
For this reason, we found it more convenient to grade the
learning performance of the net by the quantity

$$\bar{e}^{2}_{\mathrm{noise}} = e^{2}_{\mathrm{noise}} - \frac{\mathrm{var}(n)}{F_{\max}^{2}} \qquad (21)$$





This quantity goes to zero in the limit of perfect learning. Clearly,
minimizing ē²_noise with respect to the parameters of the system
is equivalent to minimizing e²_noise.
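
As a hedged illustration of how such a grading quantity might be computed, the sketch below averages the squared difference between noisy targets and net outputs within each subinterval of the normalized net output F_net/F_max, averages over the populated subintervals, and then subtracts var(n)/F_max² as in equation (21). The exact form of the binned average, the function name `noise_error`, and the choice of 11 subintervals are assumptions made for the example, not the authors' definition.

```python
import numpy as np

def noise_error(u_noisy_target, u_net, noise_var, F0, Fmax, n_sub=11):
    """Return (e2_noise, e2_bar): binned error against noisy targets and its
    normalized form, which should vanish for perfect learning."""
    u_noisy_target = np.asarray(u_noisy_target, dtype=float)
    u_net = np.asarray(u_net, dtype=float)
    scale = F0 / Fmax
    f_net = scale * u_net                               # F_net / F_max selects the subinterval
    sq_err = (scale * (u_noisy_target - u_net)) ** 2    # squared error in units of F_max
    edges = np.linspace(-1.0, 1.0, n_sub + 1)
    k = np.clip(np.digitize(f_net, edges) - 1, 0, n_sub - 1)
    per_bin = [sq_err[k == j].mean() for j in range(n_sub) if np.any(k == j)]
    e2_noise = float(np.mean(per_bin))                  # average over populated subintervals
    e2_bar = e2_noise - noise_var / Fmax ** 2           # eq. (21)
    return e2_noise, e2_bar

# Perfect learning: the "net" reproduces the exact force, targets carry noise.
rng = np.random.default_rng(1)
Fmax = F0 = 60.0
sigma = 0.1 * Fmax                                      # N/S = 0.1 relative to F_max (assumption)
F_exact = rng.uniform(-Fmax, Fmax, 200_000)
noise = rng.normal(0.0, sigma, F_exact.size)
e2_noise, e2_bar = noise_error((F_exact + noise) / F0, F_exact / F0,
                               noise_var=sigma ** 2, F0=F0, Fmax=Fmax)
print(f"e2_noise = {e2_noise:.4f}, e2_bar = {e2_bar:.5f}")
```

With a perfect net the first value approaches var(n)/F_max² and the second approaches zero, which is the property the text uses to justify grading by the normalized quantity.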

It is essential to choose small values for the steepest descent
parameter α in order to minimize the effect of the noise
fluctuations. For a large α, samples with large deviations tend
to overcontribute to the adjustments of the weights, and they
mislead the search for the minimum. For a small α, the effect
of such deviants tends to be balanced towards the average since
samples with small deviations occur with a higher probability.
From a geometric point of view, the fluctuations drastically
complicate the topology of the error surfaces by creating more
irregularities and increasing the possibility of the net getting
trapped in local minima or flat spots. Small values of α favor
adiabatic changes of the weights towards paths of the energy
surface which correspond to averaged values of the training data:
that is, ⟨u_target⟩ = ⟨u + n⟩ = u. In the absence of noise (i.e.,
when the target values are the exact values), the momentum term
speeds up the convergence process by amplifying the weight
adjustments. In the presence of noise, a momentum term would
amplify the undesired weight changes resulting from the highly
deviant data, which would make adiabaticity more difficult to
maintain. For this reason, the momentum coefficient was
chosen to be zero. The price to pay for a small steepest-descent
coefficient and a zero momentum term is, of course, more
iterations to reach convergence.
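
The averaging effect of a small steepest descent parameter can be seen in a deliberately simple setting. The sketch below is not the network of the paper: it adjusts a single weight w toward noisy samples of a constant value u, using a gradient step with coefficient `alpha` and a momentum coefficient `beta`. All numerical values are illustrative assumptions.

```python
import numpy as np

def track_noisy_target(alpha, beta=0.0, u=0.3, sigma=0.1, n_iter=20000, seed=0):
    """Gradient descent on 0.5*(target - w)**2 with noisy targets.
    Returns the values of w over the second half of the run."""
    rng = np.random.default_rng(seed)
    w, dw = 0.0, 0.0
    history = np.empty(n_iter)
    for i in range(n_iter):
        target = u + rng.normal(0.0, sigma)     # noisy sample of the exact value u
        dw = alpha * (target - w) + beta * dw   # steepest descent step plus momentum
        w += dw
        history[i] = w
    return history[n_iter // 2:]                # discard the initial transient

small = track_noisy_target(alpha=0.005)
large = track_noisy_target(alpha=0.5)
with_momentum = track_noisy_target(alpha=0.005, beta=0.9)
for name, h in (("small alpha", small), ("large alpha", large),
                ("small alpha + momentum", with_momentum)):
    print(f"{name:24s} mean = {h.mean():.3f}  std = {h.std():.4f}")
```

In this toy case the small coefficient settles close to the mean of the noisy targets with little scatter, while a large coefficient or an added momentum term lets individual deviant samples move the weight much more, at the cost of a slower approach for the small coefficient.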

At each BEP iteration, the weights (including thresholds)
were updated through equation (14). As with noise-free data,
the cutoff frequency f_c could be estimated from the spectral
analysis of the training data set, since the addition of white
noise amounts to a constant shift of the power spectra. For
N/S = 0.1, the neural net was trained with a training data set
of 15 000 randomly generated cart-pole motions sampled at
f_s = 20 Hz over T = 0.5 sec. Figure 9 shows the noise-corrupted
control force of an arbitrary cart-pole motion for
this noise-to-signal ratio. For each training sequence, the BEP
algorithm was run for 1 million iterations with the values
α = 0.006 and β = 0, and with the parameter F_0 = F_max = 60 N.

For 10 training sequences, estimates of ē²_noise (eq. (21)) and
its standard deviation were 0.0037 and 0.0001, respectively.
By direct comparison with the exact data, estimates of e²
(eq. (15)) and its standard deviation were 0.0035 and 0.0001,
respectively, in the generalizing open-loop mode. In the
generalizing closed-loop mode, estimates of e² and its
standard deviation were 0.0025 and 0.00006, respectively.

The dynamic characteristics of the cart pole controlled by
the perceptron of figure 3, trained with noisy data, are illustrated
in figures 10 and 11 for Z(0) = (-0.45 m, 0, -18°, 0)
and Z(0) = (-1.9 m, 0, 13°, 0), respectively. The agreement
with the teacher is excellent.

For N/S = 0.5, the neural net was trained with a training
data set of 35 000 randomly generated cart-pole motions
sampled at f_s = 20 Hz during a shorter period, T = 0.4 sec.
Figure 12 shows typical noise-corrupted data of the cart-pole
motions. In each training sequence, the BEP algorithm was
run for 10 million iterations with the values α = 0.002
and β = 0 and with F_0 = 60 N. For 10 training sequences,




Figure 10.— Performance of the perceptron after 1 million training iterations
with noise-corrupted data of the linear control law; initial state vector,
Z(0) = (-0.45 m, 0, -18°, 0). Noise-to-signal ratio, N/S, 0.1; control
signal normalizing factor, F_0, 60 N; steepest descent parameter, α, 0.006;
momentum coefficient, β, 0.

Figure 11.— Performance of the perceptron after 1 million training iterations
with noise-corrupted data of the linear control law; initial state vector,
Z(0) = (-1.9 m, 0, -13°, 0). Noise-to-signal ratio, N/S, 0.1; control signal
normalizing factor, F_0, 60 N; steepest descent parameter, α, 0.006;
momentum coefficient, β, 0.

Figure 12.— Typical control forces applied by the linear teacher to stabilize
the cart pole, and their noise-corrupted values used to train the network.
Noise-to-signal ratio, N/S, 0.5; control signal normalizing factor,
F_0, 60 N.

Figure 13.— Performance of the perceptron after 10 million training iterations
with noise-corrupted data of the linear control law; initial state vector,
Z(0) = (1 m, 0, 3,8°, 0). Noise-to-signal ratio, N/S, 0.5; control signal
normalizing factor, F_0, 60 N; steepest descent parameter, α, 0.002;
momentum coefficient, β, 0.

estimates of ē²_noise and its standard deviation were 0.006 and
0.002, respectively. By direct comparison with the exact data,
estimates of e² (eq. (15)) and its standard deviation were
0.006 and 0.0006, respectively, in the generalizing open-loop
mode. In the generalizing closed-loop mode, estimates
of e² and its standard deviation were 0.004 and 0.0004,
respectively. Typical curves for the cart pole controlled by
the neural net are shown in figures 13 and 14. Even with such
a high noise level, the perceptron learned the process of returning
the cart pole to the origin very well.

Nonlinear Control Law 

The methodology developed for the linear controller was 
applied and tested on a nonlinear control law (ref. 4). The exact 
dynamical evolution of the cart pole (fig. 2) is given by the 
equations of motion 


$$\ddot{\theta} = h_1 - h_2 \ddot{X}, \qquad \ddot{X} = \frac{h_3 + F(Z)}{h_4}$$

where

$$
\begin{aligned}
h_1 &= \frac{3}{4L}\, g \sin\theta \\
h_2 &= \frac{3}{4L} \cos\theta \\
h_3 &= m\left(L \sin\theta\, \dot{\theta}^{2} - \tfrac{3}{8}\, g \sin 2\theta\right) - f\dot{X} \\
h_4 &= M + m\left(1 - \tfrac{3}{4} \cos^{2}\theta\right)
\end{aligned}
\qquad (22)
$$

Figure 14.— Performance of the perceptron after 10 million training iterations
with noise-corrupted data of the linear control law; initial state vector,
Z(0) = (-1.25 m, 0, -19°, 0). Noise-to-signal ratio, N/S, 0.5; control
signal normalizing factor, F_0, 60 N; steepest descent parameter, α, 0.006;
momentum coefficient, β, 0.

The control force F(Z) was generated by applying a feedback
linearizing and decoupling transform (ref. 10)

$$F(Z) = \frac{h_4}{h_2}\left(h_1 + k_1\theta + k_2\dot{\theta} + c_1 X + c_2\dot{X}\right) - h_3 \qquad (23)$$


where k_1 = 25, k_2 = 10, c_1 = 1, and c_2 = 2.6. Training data
were generated by integrating equation (22) with the initial
condition Z(0) = [X(0), 0, θ(0), 0], where X(0) and θ(0) were
an arbitrary position and angle in the domain D_Xθ extended
to [-4 m, +4 m] × [-50°, +50°]. The neural architecture
chosen to approximate the nonlinear control law F(Z) was the
feed-forward net represented in figure 15. The input layer,
which had four linear neurons, fanned out the continuous
values of the state variables Z = (X, Ẋ, θ, θ̇) to the 16 neurons
of the first hidden layer. Each neuron of the first hidden layer
was connected to all four neurons of the second hidden layer.
The neurons in the second hidden layer were all connected
to a single neuron in the last layer. As shown in figure 15,
each neuron input was connected to a fan-out unit that was
permanently “on” (threshold term). The output signal of each
neuron was modulated linearly by the (synaptic) weights before
it excited or inhibited another connected neuron.
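
The equations of motion and the feedback-linearizing control law can be exercised in a short simulation. The sketch below assumes the reading of equations (22) and (23) in which θ̈ = h_1 - h_2 Ẍ, Ẍ = (h_3 + F)/h_4, and F = (h_4/h_2)(h_1 + k_1 θ + k_2 θ̇ + c_1 X + c_2 Ẋ) - h_3, with the gains quoted in the text. The physical parameters (pole half-length, masses, friction coefficient) and the Runge-Kutta integrator are assumptions chosen for illustration; they are not the values or the method used by the authors.

```python
import numpy as np

# Assumed physical parameters (not from the source).
G, L_POLE, M_CART, M_POLE, FRICTION = 9.81, 0.5, 1.0, 0.1, 0.0
# Gains quoted in the text for eq. (23).
K1, K2, C1, C2 = 25.0, 10.0, 1.0, 2.6

def h_terms(Z):
    """Auxiliary expressions h_1..h_4 of eq. (22) for Z = (X, Xdot, theta, thetadot)."""
    X, Xd, th, thd = Z
    h1 = 3.0 * G * np.sin(th) / (4.0 * L_POLE)
    h2 = 3.0 * np.cos(th) / (4.0 * L_POLE)
    h3 = M_POLE * (L_POLE * np.sin(th) * thd ** 2
                   - 0.375 * G * np.sin(2.0 * th)) - FRICTION * Xd
    h4 = M_CART + M_POLE * (1.0 - 0.75 * np.cos(th) ** 2)
    return h1, h2, h3, h4

def control_force(Z):
    """Teacher control law, eq. (23)."""
    X, Xd, th, thd = Z
    h1, h2, h3, h4 = h_terms(Z)
    return (h4 / h2) * (h1 + K1 * th + K2 * thd + C1 * X + C2 * Xd) - h3

def derivative(Z):
    """Time derivative of the state under the teacher's force."""
    X, Xd, th, thd = Z
    h1, h2, h3, h4 = h_terms(Z)
    Xdd = (h3 + control_force(Z)) / h4
    thdd = h1 - h2 * Xdd
    return np.array([Xd, Xdd, thd, thdd])

def simulate(Z0, T=20.0, dt=0.01):
    """Integrate the closed loop with classical fourth-order Runge-Kutta steps."""
    Z = np.array(Z0, dtype=float)
    for _ in range(int(round(T / dt))):
        k1 = derivative(Z)
        k2 = derivative(Z + 0.5 * dt * k1)
        k3 = derivative(Z + 0.5 * dt * k2)
        k4 = derivative(Z + dt * k3)
        Z = Z + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return Z

Z_end = simulate([-1.0, 0.0, np.radians(45.0), 0.0])
print("final state (X, Xdot, theta, thetadot):", np.round(Z_end, 4))
```

Under these assumptions the force cancels the nonlinear terms, so the closed loop reduces to θ̈ = -(k_1 θ + k_2 θ̇ + c_1 X + c_2 Ẋ), and the cart pole settles at the origin from the initial state used in figure 16.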

The layers were labeled by the index p from 0 to 4, p = 0
denoting the input layer. Layer p had ν(p) elements consisting
of [ν(p) - 1] neurons and one fan-out unit which was
permanently “on” and used to define the thresholds of the
neurons of the (p + 1)th layer. The weight connecting the ith
neuron of the pth layer to the jth neuron of the (p + 1)th layer
was represented by w_{j,(p+1);i,p}. The threshold of the jth neuron
of the (p + 1)th layer corresponded therefore to w_{j,(p+1);ν(p),p}.



[Figure 15 diagram labels: network output ô_n; error e = ½(o_n - ô_n)²; adjustment of weights and thresholds.]

Figure 15.— Feed-forward neural network architecture with two hidden layers
to learn the nonlinear control law (eq. (23)) from noise-free and noise-corrupted
data {o_n = [1.25 + F(Z)(+ noise)/F_0]/2.5}.

At the nth iteration, the weights were updated as

$$\delta w^{(n+1)}_{j,(p+1);\,i,p} = \alpha\, o_{i,p}\, \Delta_{j,(p+1)} + \beta\, \delta w^{(n)}_{j,(p+1);\,i,p} \qquad (24)$$

where the signal errors Δ_{k,(p+2)} at the (p + 2)th layer were
back propagated to the (p + 1)th layer to give the signal error
Δ_{j,(p+1)} at the (p + 1)th layer

$$\Delta_{j,(p+1)} = o_{j,(p+1)}\left[1 - o_{j,(p+1)}\right] \sum_{k=1}^{\nu(p+2)-1} \Delta_{k,(p+2)}\, w_{k,(p+2);\,j,(p+1)} \qquad (25)$$


If ô_n is the network output, and o_n the (noise-free or noise-corrupted)
target output, the error signal Δ_{1,4} at the output
layer is the gradient of the error given (as in eq. (14)) by

$$\Delta_{1,4} = (o_n - \hat{o}_n)\, \hat{o}_n\, (1 - \hat{o}_n) \qquad (26)$$

In contrast to the linear case where perfect learning was
obtained in the limit F_0 → ∞, the only way to find the optimal
value of F_0 for a nonlinear control law is to train the net for
several values of F_0 larger than F_max and compare the
learning performances.
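
A compact sketch of back-error-propagation for the 4-16-4-1 architecture described above may help fix ideas. It assumes logistic neurons in the hidden and output layers, a permanently "on" unit appended to each layer to carry the thresholds, the output-layer error signal (o_n - ô_n) ô_n (1 - ô_n), and a weight change of the form α × (neuron output) × (error signal) plus β times the previous change. The target scaling follows the caption of figure 15. The class name, the initial weight range, and the training pattern are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scale_target(F, F0):
    """Map a force to the target output, o_n = [1.25 + F/F_0]/2.5."""
    return (1.25 + F / F0) / 2.5

class FeedForwardNet:
    def __init__(self, sizes=(4, 16, 4, 1), seed=0):
        rng = np.random.default_rng(seed)
        # One matrix per layer transition; the last column holds the thresholds.
        self.W = [rng.uniform(-0.5, 0.5, size=(n_out, n_in + 1))
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.dW = [np.zeros_like(w) for w in self.W]

    def forward(self, z):
        """Return the outputs of every layer, each with the 'on' unit appended."""
        outputs = [np.append(np.asarray(z, dtype=float), 1.0)]  # linear input layer
        for w in self.W:
            outputs.append(np.append(sigmoid(w @ outputs[-1]), 1.0))
        return outputs

    def train_step(self, z, target, alpha=0.2, beta=0.9):
        outputs = self.forward(z)
        o_hat = outputs[-1][:-1]
        delta = (target - o_hat) * o_hat * (1.0 - o_hat)   # output-layer error signal
        for p in range(len(self.W) - 1, -1, -1):
            grad = np.outer(delta, outputs[p])
            if p > 0:
                o = outputs[p][:-1]
                # Back propagate through the current weights (thresholds excluded).
                delta_prev = o * (1.0 - o) * (self.W[p][:, :-1].T @ delta)
            self.dW[p] = alpha * grad + beta * self.dW[p]  # step plus momentum
            self.W[p] += self.dW[p]
            if p > 0:
                delta = delta_prev
        return float(0.5 * np.sum((target - o_hat) ** 2))

net = FeedForwardNet()
z = np.array([0.1, -0.2, 0.3, 0.0])          # hypothetical state (X, Xdot, theta, thetadot)
target = scale_target(120.0, 120.0)          # F = F_0 maps to 0.9
first_error = net.train_step(z, target)
for _ in range(200):
    last_error = net.train_step(z, target)
print(f"error before: {first_error:.5f}, after 200 steps: {last_error:.2e}")
```

The scaling keeps targets within [0.1, 0.9] for |F| ≤ F_0, away from the saturated ends of the logistic output, which is one plausible reason for the offsets 1.25 and 2.5 in the caption.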

Learning from noise-free data.— Owing to the symmetry
of the problem, it is assumed that the control force F(Z)
changes as F → -F when the state vector changes as Z → -Z.
Therefore, the ensemble of training data was chosen to be
symmetric under the transformation T^- = [(Z, F)
→ (-Z, -F)] by randomly distributing the initial position and
angle of the cart pole over the domain D_Xθ = [-4 m, +4 m]
× [-50°, +50°].
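
A small sketch of how a symmetric ensemble of training data might be assembled is given below. The source obtains the symmetry by drawing initial positions and angles at random over a symmetric domain; the explicit mirroring step shown here is an additional assumption that enforces the symmetry exactly for a finite sample. Function names are hypothetical.

```python
import numpy as np

def random_initial_states(n, x_max=4.0, theta_max_deg=50.0, seed=0):
    """Draw Z(0) = [X(0), 0, theta(0), 0] uniformly over the domain of initial conditions."""
    rng = np.random.default_rng(seed)
    X0 = rng.uniform(-x_max, x_max, n)
    th0 = np.radians(rng.uniform(-theta_max_deg, theta_max_deg, n))
    zeros = np.zeros(n)
    return np.column_stack([X0, zeros, th0, zeros])

def symmetrize(Z, F):
    """Append the mirrored samples (-Z, -F) to a set of states Z and forces F."""
    Z = np.asarray(Z, dtype=float)
    F = np.asarray(F, dtype=float)
    return np.vstack([Z, -Z]), np.concatenate([F, -F])

Z0 = random_initial_states(5)
F0_samples = np.arange(5.0)                  # placeholder forces for the example
Z_sym, F_sym = symmetrize(Z0, F0_samples)
print(Z_sym.shape, F_sym.shape)
```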

In the general case of a nonlinear control law, a spectrum
analysis of the training data set provides only a gross estimate
of the cutoff frequency. This value could be used as an
educated guess for the optimization of f_s in minimizing the
error e² (eq. (15)).

An upper bound for the control force needed to bring the
cart pole back to the origin from a position in D_Xθ, and
without initial velocities, is F_max = 120 N. In this section,
training was performed over 200 motions sampled at f_s = 20
Hz during T = 10 sec. The steepest descent coefficient and
the momentum term of the BEP algorithm were α = 0.2 and
β = 0.9. In learning the mapping F(Z) (eq. (23)) and using
it to control the cart pole, the neuromorphic controller was
able to stabilize the pole to θ = 0, but it would occasionally
return the cart to an equilibrium position fluctuating in the
vicinity of X = 0. To circumvent this numerical difficulty due
to the existence of a local minimum or flat spot, we fine-tuned
the controller by augmenting the training with data randomly
sampled in the subinterval I_6, which is symmetric under the
T^- transformation.

The mean-squared error was calculated for several values
of F_0 ≥ F_max over a random set of 10 training sequences,
which consisted of 99 000 gross-tuning iterations and 1000
fine-tuning iterations. Analyzing the error as a function of F_0
indicated that a minimum was reached for F_0 = F_max. With
F_0 = 120 N, estimates of the total mean-squared error e² in
the learning open-loop mode (eq. (15)) and its standard
deviation were 0.0005 and 0.00017, respectively. In the
generalizing open-loop mode, estimates of e² and its standard
deviation were 0.00084 and 0.00037, respectively. In the
generalizing closed-loop mode, estimates of e² and its
standard deviation were 0.0007 and 0.00032. The results of
the computation are shown in figures 16 and 17 for the initial
state vectors Z(0) = (-1 m, 0, 45°, 0) and Z(0) = (3 m, 0,




Figure 16.— Performance of the four-layer neural network after 100 000
training iterations with noise-free data of the nonlinear control law; initial
state vector, Z(0) = (-1 m, 0, 45°, 0). Control signal normalizing factor,
F_0, 120 N; steepest descent parameter, α, 0.2; momentum coefficient, β, 0.9.

Figure 17.— Performance of the four-layer neural network after 100 000 training
iterations with noise-free data of the nonlinear control law; initial state
vector, Z(0) = (3 m, 0, -35°, 0). Control signal normalizing factor,
F_0, 120 N; steepest descent parameter, α, 0.2; momentum coefficient, β, 0.9.


-35°, 0), respectively. The neural net was able to return the 
cart pole very satisfactorily to the origin from large angles 
and large displacements. This demonstrates that the internal 
representation that the net developed during training is a very 
good approximation of the mapping defined in equation (23). 

Learning from noise-corrupted data.— The mapping to be
extracted and learned by the net was assumed to be symmetric
with respect to the T^- transformation, and the initial position
and angle of the cart pole were also randomly distributed over
D_Xθ. Towards the end of the training, the net was fine-tuned
by training it with data sampled only from the subinterval I_6,
which is symmetric with respect to T^-. Training data were generated
by integrating the equations of motion (eq. (22)) and adding
to the control force a noise normally distributed around zero.

The probability distribution functions of the data that were
actually presented to the net of figure 15 during training were
plotted for different observation times T and various noise-to-signal
ratios. The characteristics of these probability densities
are the same as those of the probability densities of the
linear control law plotted in figures 6 to 8. For large values
of T, the noise contamination of the sampled data prevented
the construction of a uniformly distributed training data
sequence. For small values of T, the data sampled from cart-pole
motions were not sufficiently representative of the control
law. Consequently, the mean-squared error e²_noise (eq. (19))
was expected to be minimum for intermediate values of T. This
is demonstrated in figure 18(a) for a noise-to-signal ratio N/S
of 0.1. For comparison, the exact mean-squared error e²
(eq. (15)) is plotted in figure 18(b) as a function of T. In figure 18,
the errors were computed over a pool of data sampled
from cart-pole motions over T_max = 100 sec, after training
the net from data sampled from cart-pole motions over
T < T_max. For larger noise-to-signal ratios, these “wells”
would be more pronounced since the errors would significantly
increase for larger values of T because of noise contamination.

With a noise-to-signal ratio of 0.1, training data were
generated by sampling 1000 cart-pole motions at f_s = 20 Hz
during T = 5 sec. With this set of parameters and following
the approach described in the section Learning from noise-corrupted
data, an upper bound for the (exact) state space of
the nonlinear control law was F_max = 120 N. The intensity of
the noise is shown in figure 19. The convergence of the training
is illustrated in figure 20, which shows e²_noise ± δe²_noise and
e² ± δe² as functions of the number of BEP iterations. A
training sequence consisted of 490 000 gross-tuning iterations
followed by 10 000 fine-tuning iterations, with the




Figure 18.— Estimations of mean-squared errors e²_noise and e² and standard
deviations after training the four-layer neural network to learn the nonlinear
control law with a noise-to-signal ratio, N/S, of 0.1. The statistics refer
to a set of five training sequences of 500 000 BEP iterations. Steepest descent
parameter, α, 0.02; momentum coefficient, β, 0; control signal normalizing
factor, F_0, 120 N.

Figure 19.— Typical control forces applied by the nonlinear teacher to stabilize
the cart pole and their noise-corrupted values used to train the neural network.
Noise-to-signal ratio, N/S, 0.1; control signal normalizing factor,
F_0, 120 N.




(a) Convergence of error due to noise as function of BEP iterations.
(b) Convergence of error as function of BEP iterations.

Figure 20.— Iterative convergence of the four-layer neural network in learning
the nonlinear control law from data corrupted with a noise-to-signal ratio,
N/S, of 0.1. (Same qualitative evolution of e²_noise and e² with respect to
the number of training iterations.)

parameter values α = 0.02, β = 0, and F_0 = F_max. For a set
of five training sequences, estimates of e²_noise and its standard
deviation were 0.013 and 0.0003, respectively. By direct
comparison with the exact data, estimates of e² (eq. (15)) and
its standard deviation were 0.0068 and 0.0004 in the
generalizing open-loop mode. The dynamic characteristics of
the cart pole controlled by the net of figure 15 are shown in
figures 21 and 22 for Z(0) = (3 m, 0, -35°, 0) and
Z(0) = (-1 m, 0, 45°, 0), respectively.

For N/S = 0.2, 4000 cart-pole motions (fig. 23) were
sampled at f_s = 20 Hz during T = 2 sec to train the net. The
BEP parameters were α = 0.01 and β = 0, and 950 000 gross-tuning
and 50 000 fine-tuning iterations were used in each
training sequence with F_0 = F_max. For five training
sequences, estimates of e²_noise and its standard deviation were
0.016 and 0.001, respectively. By direct comparison with the
exact data, estimates of e² (eq. (15)) and its standard deviation
were 0.009 and 0.0008 in the generalizing open-loop
mode. The dynamic characteristics of the cart pole controlled
by the neural net are shown in figures 24 and 25 for
Z(0) = (3 m, 0, -35°, 0) and Z(0) = (-1 m, 0, 45°, 0),
respectively. For these two examples, the learning performance
could be further enhanced by minimizing e²_noise with
respect to the scaling factor, F_0 > F_max, and the sampling
rate f_s.

These results show that, in spite of a significant amount of
noise, the net was able to learn the control law (eq. (23)) with
sufficient accuracy to return the cart pole satisfactorily to its
equilibrium position.

Figure 21.— Performance of the four-layer neural network after 500 000
training iterations with noise-corrupted data of the nonlinear control law; initial
state vector, Z(0) = (3 m, 0, -35°, 0). Noise-to-signal ratio, N/S, 0.1;
control signal normalizing factor, F_0, 120 N; steepest descent parameter,
α, 0.02; momentum coefficient, β, 0.

Figure 22.— Performance of the four-layer neural network after 500 000
training iterations with noise-corrupted data of the nonlinear control law; initial
state vector, Z(0) = (-1 m, 0, 45°, 0). Noise-to-signal ratio, N/S, 0.1;
control signal normalizing factor, F_0, 120 N; steepest descent parameter,
α, 0.02; momentum coefficient, β, 0.



Figure 23.— Typical control forces applied by the nonlinear teacher to stabilize
the cart pole, and their noise-corrupted values used to train the neural
network. Noise-to-signal ratio, N/S, 0.2; control signal normalizing factor,
F_0, 120 N.

Figure 24.— Performance of the four-layer neural network after 1 million training
iterations with noise-corrupted data of the nonlinear control law; initial state
vector, Z(0) = (3 m, 0, -35°, 0). Noise-to-signal ratio, N/S, 0.2; control
signal normalizing factor, F_0, 120 N; steepest descent parameter, α, 0.01;
momentum coefficient, β, 0.

[Figure 25: horizontal axis, time, T, sec.]

Figure 25.— Performance of the four-layer neural network after 1 million training iterations with noise-corrupted data of the nonlinear control law; initial state
vector, Z(0) = (-1 m, 0, 45°, 0). Noise-to-signal ratio, N/S, 0.2; control signal normalizing factor, F_0, 120 N; steepest descent parameter, α, 0.01;
momentum coefficient, β, 0.


Example of Neuromorphic Learning 
of Nonlinear Control With Measurement 
Noise and Data Processing Noise 

The ability of the feed-forward net to map a nonlinear control
law was tested further by introducing the training architecture
shown in figure 26. Here the dynamics of the cart pole
controlled by the teacher were corrupted by noise. In addition,
the representation of the cart-pole motions transmitted to the
net was corrupted because of a noisy data communication link.

In the first phase of collecting the training data, cart-pole
motions were generated from various initial positions randomly
distributed over D_Xθ = [-4 m, +4 m] × [-50°, +50°]. If the
four-dimensional vector n^s_Z represents the noises associated
with the measurement of the actual state vector Z^a, the value
of the state vector Z^s passed to the teacher is

$$Z^{s} = Z^{a} + n^{s}_{Z} \qquad (27)$$

Although the teacher knows the exact transfer function
F[Z(t)], noise may occur during the physical application of
the force and create a discrepancy between the force applied
by the actuator, F^a, and the desired force F(Z^s),

$$F^{a} = F(Z^{s}) + n^{a}_{F} \qquad (28)$$



Figure 26.— General training architecture for neuromorphic learning with noisy sensor measurements (noise within control loop) and noisy data processing
(noise on the data communication links).

In contrast to the training architecture of figure 1, for the
architecture of figure 26 noise was included within the control
loop itself. As a result, for the same initial conditions, the
architectures of figures 1 and 26 led to different motions of
the cart pole. In this numerical application, an N/S of 0.02
was chosen for the fluctuations of n^a_F, n^s_X, n^s_θ, and n^s_θ̇.
Figure 27(a) shows the normalized control force actually
applied by the teacher-controller to the cart pole for the initial
state vector Z(0) = (-1 m, 0, 45°, 0). The actual values of
the corresponding position X^a(t) and angle θ^a(t) are shown
in figures 27(b) and (c), together with the noise fluctuations
n^s_X and n^s_θ. Their comparison with the noise-free trajectories
of the cart pole, as given by the solid lines of figure 16, shows
the existence of small low-frequency oscillations around the
equilibrium position, especially for θ.

In the second phase of collecting the training data, the values
of the force F^a applied to the cart pole and the values of the
state variables Z^s measured by the sensors were stored off-line
through data communication links. Noise that may occur
during the analog signal processing of the state vector Z^s was
simulated by a four-dimensional vector ñ^s_Z normally
distributed around zero

$$Z_{\mathrm{training}}(t) = Z^{s}(t) + \tilde{n}^{s}_{Z} \qquad (29)$$

The noise that may be added to the force during the data
processing was simulated by a normal distribution ñ^a_F

$$F_{\mathrm{training}}(t) = F^{a}(t) + \tilde{n}^{a}_{F} \qquad (30)$$

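In this simulation, N/S = 0.04 was chosen for the noise
fluctuations incurred during data processing: that is,

$$\frac{\sqrt{\mathrm{var}(\tilde{n}^{a}_{F})}}{F_{\max}} = \frac{\sqrt{\mathrm{var}(\tilde{n}^{s}_{Z})}}{\max\left|Z^{a}\right|} = 0.04$$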

The effects of such measurement noise and data processing
noise on the values of the force, F_training, and state vector,
Z_training, used for training are illustrated in figure 28 for the
initial condition Z(0) = (-1 m, 0, 45°, 0).
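
The chain of corruptions in equations (27) to (30) can be written as a short routine. The sketch below assumes that each noise is Gaussian with zero mean and a standard deviation equal to the stated noise-to-signal ratio times a reference amplitude (F_max for the force, a per-component range `Z_max` for the state), and that the teacher evaluates the control law on the sensed state. The function name, `Z_max`, and the toy linear law in the usage lines are hypothetical.

```python
import numpy as np

def noisy_training_sample(Z_actual, control_law, F_max, Z_max,
                          ns_sensor=0.02, ns_actuator=0.02, ns_link=0.04, rng=None):
    """Return (Z_sensed, F_applied, Z_training, F_training) for one time step."""
    rng = np.random.default_rng() if rng is None else rng
    Z_actual = np.asarray(Z_actual, dtype=float)
    Z_max = np.asarray(Z_max, dtype=float)
    # Eq. (27): sensor noise inside the control loop.
    Z_sensed = Z_actual + rng.normal(0.0, ns_sensor * Z_max)
    # Eq. (28): actuator noise added to the force the teacher requests.
    F_applied = control_law(Z_sensed) + rng.normal(0.0, ns_actuator * F_max)
    # Eqs. (29) and (30): noise on the data links used to record training data.
    Z_training = Z_sensed + rng.normal(0.0, ns_link * Z_max)
    F_training = F_applied + rng.normal(0.0, ns_link * F_max)
    return Z_sensed, F_applied, Z_training, F_training

toy_law = lambda Z: -2.0 * Z[0] - Z[2]       # placeholder for the teacher's F(Z)
sample = noisy_training_sample([1.0, 0.0, 0.5, 0.0], toy_law, F_max=120.0,
                               Z_max=[4.0, 5.0, 1.0, 5.0],
                               rng=np.random.default_rng(0))
print([np.round(s, 3) for s in sample])
```

Only `Z_sensed` and `F_applied` feed back into the cart-pole dynamics; the link noise affects what the net is trained on but not the motion itself, which is the distinction the text draws between measurement noise and data processing noise.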

In the third phase, the noise-corrupted data were used to
train the neural net according to the method developed in
the previous section. After 1 million training iterations with
α = 0.01 and β = 0 (requiring 1 hr of VAX 8800 CPU time),
the performance of the net was compared with the performance
of the teacher. Figure 29(a) shows the force applied by the
neural net controller to return the cart pole to its equilibrium
position from the initial state Z(0) = (-1 m, 0, 45°, 0). For
that motion, the values of the position X^s(t) and angle θ^s(t),
measured by the sensors, are shown in figure 29(b) and (c)
together with the actual values X^a(t) and θ^a(t). A comparison
of figure 27(b) and (c) with figure 29(b) and (c)
shows that the motion of the cart pole was smoother, and more
stable, when it was controlled by the net than when it was
controlled by the teacher. These findings are illustrated further
in figure 30 (teacher) and figure 31 (neural net) for the initial
state vector Z(0) = (3 m, 0, -35°, 0). (Compare, in particular,
figs. 30(c) and 31(c).)

In effect, the statistical nature of the procedure used to 
construct a training data sequence allows the net to overcome 
noise fluctuations, and to return the cart pole to its equilibrium 
position. A very interesting characteristic of the neurocon- 
troller is the attenuation of the low-frequency fluctuations 
created by the presence of noise within the control loop. 



(a) Normalized control force actually applied by the teacher-controller to the
cart pole as function of time.

(b) Cart position as function of time.

(c) Pendulum angle as function of time.

Figure 27.— Dynamics of cart pole controlled by the nonlinear teacher in the
architecture of figure 26 and operating in the presence of noise; initial state
vector, Z(0) = (-1 m, 0, 45°, 0). Sensor noise-to-signal ratio, (N/S)_sensors,
0.02; control signal normalizing factor, F_0, 120 N.

[Figures 29 and 30: horizontal axis, time, T, sec.]

(a) Force to return cart pole to its equilibrium position from its initial state.

(b) Cart position as function of time.

(c) Pendulum angle as function of time.

Figure 29.— Dynamics of cart pole controlled by the neural network, trained
from the noise-corrupted data in figure 26, and operating in the presence
of noise after 1 million BEP iterations; initial state vector, Z(0) = (-1 m,
0, 45°, 0). Sensor noise-to-signal ratio, (N/S)_sensors, 0.02; control signal
normalizing factor, F_0, 120 N; steepest descent parameter, α, 0.01;
momentum coefficient, β, 0.

(c) Pendulum angle as function of time.

Figure 30.— Dynamics of cart pole controlled by the teacher and operating
in the presence of noise; initial state vector, Z(0) = (3 m, 0, -35°, 0).
Noise-to-signal ratio, (N/S)_sensors, 0.02; control signal normalizing factor,
F_0, 120 N.

(a) Force to return cart pole to its equilibrium position from its initial state.

(b) Cart position as function of time.

(c) Pendulum angle as function of time.

Figure 31.— Dynamics of cart pole controlled by the neural network, trained
from the noise-corrupted data in figure 26, and operating in the presence
of noise, after 1 million BEP iterations; initial state vector, Z(0) =
(3 m, 0, -35°, 0). Noise-to-signal ratio, (N/S)_sensors, 0.02; control signal
normalizing factor, F_0, 120 N; steepest descent parameter, α, 0.01;
momentum coefficient, β, 0.


Conclusions

These results demonstrate that neural networks can
satisfactorily extract a continuous-valued, nonlinear mapping
between input and output data even when the training data are
corrupted by measurement noise and data processing noise.
In order to learn such a mapping (control law) on a certain
region of the state space, the training data set must have the
features that represent the mapping (control law) on the same
region. In addition, the data of the training set must be ordered
by magnitude to generate a training data sequence that is
uniformly and randomly distributed throughout the state space
of the original data. In the absence of noise, uniform distributions
can be constructed by randomly sampling subintervals
of the state space. However, the presence of noise mixes the
populations of these subintervals, limiting the uniformity of
the data distribution in the state space. Application of the variational
principle to an error function determines the parameter
values that minimize the effect of this noise contamination
while maximizing the degree of embodiment of the mapping
(control law) in the training data set.

Topics of further research would include the use of a time-dependent
sampling rate and the use of neuromorphic classifiers
in order to reduce the effect of the noise contamination
and improve the neuromorphic learning of control laws. In
the first case, decreasing the sampling rate as the controlled
process returns to its equilibrium position would favor a more
uniform probability distribution of the exact training data set,
and thereby a more uniform distribution of the training sequence.
In the second case, a neural architecture could be used
as a noise-filtering preprocessor to selectively construct a
training sequence that would be more uniformly distributed.

The neural computation was not only found to filter the
noises and allow the neurocontroller to substitute for the
teacher-controller, but also to provide a smoother mode of
operation than the teacher. This means, for example, that a
neural net could be trained off-line by observing a conventional
digital microprocessor perform control operations in a very
noisy environment and could be used as a substitute neurocontroller
to improve the quality of the control in terms of
stability. Besides the multiple advantages of the neural
computation and of its practical implementation discussed in
the introduction, the latter property may lead to new noise-filtering
applications for neural network technology.


Acknowledgments 

We express our gratitude to Carl Lorenzo for suggesting 
the analysis discussed in the section Example of Neuro- 
morphic Learning of Nonlinear Control With Measure- 
ment Noise and Data Processing Noise. We are also grateful 
to Michelle Bright for suggesting the discussion of the control 
force normalization constant. 

Lewis Research Center 

National Aeronautics and Space Administration 
Cleveland, Ohio, December 5, 1989 





Appendix— Symbols 


c                     applied control signal

D_Xθ                  domain of initial position and angle of the
                      cart pole

e²                    total mean-squared error between targets
                      and neural net outputs

e²_noise              total mean-squared error between noisy
                      targets and neural net outputs

ē²_noise              normalized expression for e²_noise going to
                      zero in perfect learning

F                     control force applied to cart

F_max                 range of variations of the control signal
                      (force)

F_0                   constant that normalizes the control signal
                      over [-1, +1]

F^a                   force applied to the cart pole by the
                      actuator in the presence of noise

f_c                   cutoff frequency above which the spectral
                      components of the control signal and the
                      state vector are small

f_s                   sampling rate of control signal and state
                      variables

G(Z)                  variable of an error function

g                     acceleration due to gravity

h_1, h_2, h_3, h_4    auxiliary expressions for the nonlinear
                      dynamics of the cart pole

I_k                   subinterval of [-1, +1]

i_net                 total input signal of a neuron

k                     coefficients of various parameters

L                     distance between the base of the pole and
                      the center of gravity of the pole

M                     mass of cart

m                     mass of pole

N_F                   number of subintervals of the control force
                      normalized in [-1, +1]

N_motions             number of observed responses of the cart
                      pole to random displacements from its
                      equilibrium position

n                     noises (independent, normally distributed,
                      zero mean processes)

O( )                  higher order terms

o_n                   target output

ô_n                   network output

p                     pth layer


$S_0$            entire region of the state space of the control law
$S_1$            region of the state space of the control law that is covered by the training data set
$T$              length of observation time of a cart-pole response
$t_k$            times at which samples are taken
$u$              normalized control force applied to cart ($F/F_0$)
var              variance
$w$              "synaptic" weight connecting two neurons
$w_{th}$         "synaptic" weight connecting a neuron to a neuron that is permanently "on" (defines its threshold)
$x$              axial location of the cart pole
$\dot{x}$        linear velocity of cart pole
$\ddot{x}$       linear acceleration of the cart
$z$              state vector of the controlled process
$z^s$            state vector measured by sensors
$\alpha$         steepest descent parameter
$\beta$          momentum coefficient of the steepest descent
[illegible]      signal error backpropagated to the $i$th neuron of the $p$th layer
[illegible]      error between target and neural net output
[illegible]      force of friction acting on cart
$\theta$         angle of pole displacement from the vertical
$\dot{\theta}$   angular velocity of the pole
$\ddot{\theta}$  angular acceleration of the pole
$\nu(p)$         number of neurons of the $p$th layer
$\Phi$           exact mapping to be learned by the neural net

Subscripts: 

ex               values that the state vector or control signal would have in the absence of noise (different from actual value due to the nonlinear effects of measurement noise)
$i$              $i$th neuron
$j$              $j$th neuron
$k$              value over the subinterval $I_k$
$n$              $n$th iteration number





Superscripts: 

$a$              actual value
$c$              data communication
$s$              sensor measurement
[illegible]      target value to be matched as closely as possible by the neural net
[illegible]      training value used to train the network
[illegible]      normalized value